We study the problem of efficient generative inference for Transformer models in one of its most challenging settings: large deep models, with tight latency targets and long sequence lengths. A better understanding of the engineering tradeoffs for inference of large Transformer-based models is important, as use cases of these models are growing rapidly throughout application areas. We develop a simple analytical model for inference efficiency to select the best multi-dimensional partitioning techniques optimized for TPU v4 slices based on the application requirements. We combine these with a suite of low-level optimizations to achieve a new Pareto frontier on the latency and model FLOPS utilization (MFU) tradeoffs on 500B+ parameter models that outperforms the FasterTransformer suite of benchmarks. We further show that with appropriate partitioning, the lower memory requirements of multiquery attention (i.e., multiple query heads share a single key/value head) enable scaling up to 32x larger context lengths. Finally, we achieve a low-batch-size latency of 29ms per token during generation (using int8 weight quantization) and a 76% MFU during large-batch-size processing of input tokens, while supporting a long 2048-token context length on the PaLM 540B parameter model.
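The memory argument behind multiquery attention can be made concrete with a back-of-the-envelope calculation: during autoregressive decoding, the key/value cache scales with the number of key/value heads, so sharing one K/V head across all query heads shrinks the cache by the head count. A minimal sketch follows; the model dimensions (118 layers, 48 heads, head dimension 256) and the `kv_cache_bytes` helper are illustrative assumptions loosely inspired by a PaLM-540B-scale model, not figures from the paper.

```python
# Sketch: KV-cache size for multi-head vs. multiquery attention.
def kv_cache_bytes(batch, seq_len, n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    """Bytes needed to cache keys and values for autoregressive decoding.
    The leading factor of 2 accounts for storing both keys and values."""
    return 2 * batch * seq_len * n_layers * n_kv_heads * head_dim * dtype_bytes

# Multi-head attention: one K/V head per query head (48 heads here).
mha = kv_cache_bytes(batch=1, seq_len=2048, n_layers=118, n_kv_heads=48, head_dim=256)

# Multiquery attention: all query heads share a single K/V head.
mqa = kv_cache_bytes(batch=1, seq_len=2048, n_layers=118, n_kv_heads=1, head_dim=256)

print(mha // mqa)  # 48 -- the cache shrinks by the number of heads
```

At a fixed memory budget, that factor can instead be spent on a proportionally longer context, which is the mechanism behind the scaled-up context lengths mentioned above.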
Large language models have been shown to achieve remarkable performance on a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system that enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the prior state of the art on a suite of multi-step reasoning tasks, and exceeding average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements with model scale, meaning that performance increased steeply as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis of bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L^2) to O(L log L), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
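The reversible-residual trick can be sketched in a few lines: a block splits its input into two halves and couples them so that the inputs can be reconstructed exactly from the outputs, which is why per-layer activations need not be stored. In the sketch below, `f` and `g` stand in for the attention and feed-forward sublayers; the specific maps chosen for them are arbitrary illustrative assumptions.

```python
import math

def f(x):  # placeholder for the attention sublayer
    return [math.tanh(v) for v in x]

def g(x):  # placeholder for the feed-forward sublayer
    return [0.5 * v for v in x]

def rev_forward(x1, x2):
    # Reversible coupling: y1 = x1 + f(x2), then y2 = x2 + g(y1).
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def rev_backward(y1, y2):
    # Inputs are recomputed from the outputs by running the coupling
    # in reverse, so intermediate activations need not be cached.
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

x1, x2 = [0.3, -1.2, 0.7], [1.1, 0.0, -0.4]
r1, r2 = rev_backward(*rev_forward(x1, x2))
print(all(abs(a - b) < 1e-9 for a, b in zip(x1 + x2, r1 + r2)))  # True
```

During backpropagation each layer's inputs are regenerated on the fly from its outputs, trading a small amount of recomputation for O(1) activation memory in the layer count.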
Complete depth information and efficient estimators have become vital ingredients in scene understanding for automated driving tasks. A major problem for LiDAR-based depth completion is the inefficient utilization of convolutions due to the lack of coherent information, caused by the sparse nature of uncorrelated LiDAR point clouds, which often leads to complex and resource-demanding networks. The problem is reinforced by the expensive acquisition of depth data for supervised training. In this work, we propose an efficient depth completion model based on a vgg05-like CNN architecture and propose a semi-supervised domain adaptation approach to transfer knowledge from synthetic to real-world data, improving data efficiency and reducing the need for a large database. In order to boost spatial coherence, we guide the learning process using segmentations as an additional source of information. The efficiency and accuracy of our approach are evaluated on the KITTI dataset. Our approach improves on previous efficient, low-parameter state-of-the-art approaches while having a noticeably lower computational footprint.
Reliable spatial uncertainty evaluation of object detection models is of special interest and has been the subject of recent work. In this work, we review existing definitions of uncertainty calibration for probabilistic regression tasks. We examine the calibration properties of common detection networks and extend state-of-the-art recalibration methods. Our method uses a Gaussian process (GP) recalibration scheme that yields parametric distributions (e.g., Gaussian or Cauchy) as output. The use of GP recalibration enables local (conditional) uncertainty calibration by capturing dependencies between neighboring samples. Using parametric distributions such as the Gaussian simplifies the adaptation of calibration in subsequent processes, e.g., for Kalman filtering in the scope of object tracking. Furthermore, we use the GP recalibration scheme to perform covariance estimation, which allows for the post-hoc introduction of local correlations between output quantities, e.g., position, width, or height in object detection. To measure the joint calibration of multivariate and possibly correlated data, we introduce the quantile calibration error, which is based on the Mahalanobis distance between the predicted distribution and the ground truth to determine whether the ground truth lies within the predicted quantile. Our experiments show that common detection models overestimate the spatial uncertainty compared to the observed error. We show that a simple isotonic regression recalibration method is sufficient to achieve good uncertainty quantification in terms of calibrated quantiles. In contrast, if a normal distribution is required for subsequent processes, our GP-Normal recalibration method yields the best results. Finally, we show that our covariance estimation method achieves the best calibration results for joint multivariate calibration.
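The quantile calibration error described above can be sketched as follows: for a predicted multivariate Gaussian, the squared Mahalanobis distance of the ground truth from the predicted mean follows a chi-squared distribution, so the fraction of ground truths falling inside each tau-quantile ellipsoid should equal tau. The sketch below is a minimal 2D version under assumed simplifications (diagonal covariances, synthetic perfectly calibrated predictions); all function names are mine, not from the paper.

```python
import math
import random

random.seed(0)

def chi2_quantile_2d(tau):
    # For 2 degrees of freedom the chi-squared CDF is 1 - exp(-x/2),
    # so the tau-quantile has this closed form.
    return -2.0 * math.log(1.0 - tau)

def coverage(preds, taus):
    """Empirical coverage of the tau-quantile ellipsoids.
    preds: list of (mean_xy, var_xy, ground_truth_xy) triples."""
    out = []
    for tau in taus:
        thresh = chi2_quantile_2d(tau)
        hits = 0
        for (mx, my), (vx, vy), (gx, gy) in preds:
            # Squared Mahalanobis distance with a diagonal covariance.
            d2 = (gx - mx) ** 2 / vx + (gy - my) ** 2 / vy
            hits += d2 <= thresh
        out.append(hits / len(preds))
    return out

# Perfectly calibrated synthetic predictions: the ground truth is drawn
# from the predicted Gaussian itself.
preds = [((0.0, 0.0), (1.0, 1.0), (random.gauss(0, 1), random.gauss(0, 1)))
         for _ in range(20000)]
taus = [0.5, 0.9]
cov = coverage(preds, taus)

# Quantile calibration error: mean absolute gap between coverage and tau.
qce = sum(abs(c - t) for c, t in zip(cov, taus)) / len(taus)
print(qce)  # close to 0 for calibrated predictions
```

A miscalibrated detector that overestimates its spatial uncertainty, as the abstract reports for common models, would show empirical coverage above tau and hence a nonzero quantile calibration error.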
Calibrated confidence estimates obtained from neural networks are crucial, especially for safety-critical applications such as autonomous driving or medical image diagnosis. However, although the task of confidence calibration has been studied for classification problems, a thorough investigation for object detection and segmentation problems is still missing. Therefore, in this chapter we focus on the investigation of confidence calibration for object detection and segmentation models. We introduce the concept of multivariate confidence calibration, an extension of well-known calibration methods to the tasks of object detection and segmentation. This allows for an extended confidence calibration that is also aware of additional features such as bounding-box/pixel position, shape information, etc. Furthermore, we extend the expected calibration error (ECE) to measure the miscalibration of object detection and segmentation models. We examine several network architectures on MS COCO as well as Cityscapes and show that, given the introduced definition of calibration, especially object detection and instance segmentation models are intrinsically miscalibrated. Using our proposed calibration methods, we are able to improve the calibration, which also has a positive impact on the quality of the segmentation masks.
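As a point of reference for the extension described above, the standard binned expected calibration error (ECE) that the chapter builds on can be sketched in a few lines. This baseline version conditions on confidence only, without the additional box-position or shape features the chapter introduces; the function name and the equal-width binning are my assumptions about a common formulation, not the chapter's exact definition.

```python
def ece(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean |accuracy - confidence| over equal-width bins."""
    n = len(confidences)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Samples whose confidence falls in (lo, hi]; bin 0 also takes 0.0.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        total += len(idx) / n * abs(acc - conf)
    return total

# A detector whose confidences match its empirical accuracy (90% confident,
# correct 9 times out of 10) has (near-)zero ECE.
print(round(ece([0.9] * 10, [1] * 9 + [0]), 6))  # 0.0
```

The multivariate extension replaces the scalar confidence bins with bins over a joint feature space (confidence plus box position, width, height, etc.), which is what makes the position-dependent miscalibration of detection models measurable.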